Introduction

A  striking  feature  of  the  pathobiology  of  human  immunodeficiency 
virus  (HIV)  is  the  long  and  variable  incubation  period  between  initial 
viral  infection  and  the  emergence  of  acquired  immunodeficiency  syndrome 
(AIDS).   Data  on  the  duration  of  the  HIV  incubation  period  have  now  been 
reported  separately  from  three  sources:   homosexual  men  (1), 
hemophiliacs (2), and transfusion recipients (3, 4). In the present
paper,  I  combine  the  evidence  from  these  three  sources  to  produce  an 
overall  statistical  description  of  the  HIV  incubation  period.   My 
statistical method closely parallels that of DuMouchel and Harris (5),
who combined the results of various dose-response experiments in the
assessment of human cancer risks.

Data 

Fig.  1  shows  the  proportion  of  HIV-infected  homosexual  men  who 
have  contracted  AIDS  by  a  given  time  after  HIV  seroconversion.   Also 
shown  are  the  68  percent  confidence  limits  around  each  step  in  the 
estimated  cumulative  distribution.   The  estimates  in  Fig.  1  were  derived 
by  the  Kaplan-Meier  technique  on  data  from  155  men  followed  in  the  San 
Francisco Clinic Cohort (1).

Fig.  2  shows  the  corresponding  cumulative  distribution,  along  with 
68  percent  confidence  limits,  for  HIV-infected  adult  hemophiliacs.   The 
estimates  were  derived  by  the  Kaplan-Meier  technique  on  data  from  40 
subjects  followed  at  the  Hemophilia  Center  of  Central  Pennsylvania  (2). 

Fig. 3 shows the proportion of transfusion recipients who have
AIDS by a given time after transfusion with HIV-infected blood. The
figure has been estimated by a nonparametric maximum likelihood
technique (6) from data on 297 transfusion-associated AIDS cases,
reported during 1978-1986, for which the date of transfusion was known
(3). The cumulative proportion of AIDS cases by 8 years is exactly 100%
because the investigators observed only the transfusion-infected cases
that have been so far diagnosed with AIDS.

Statistical  Methods 

Figs.  1  through  3  provide  separate  estimates  of  the  cumulative 
distribution  function  for  the  incubation  period  of  HIV.   My  task  is  to 
combine  the  three  sources  of  data  to  produce  an  overall,  synthetic 
estimate  of  the  cumulative  distribution. 

The  nonparametric  estimates  in  Figs.  1  through  3  are  step 
functions  with  different  support  points  that  depend  upon  the  observed 
incubation  periods  of  the  subjects  under  study.   From  each  cumulative 
distribution,  however,  we  can  derive  the  estimated  incidence  of  AIDS  at 
evenly  spaced  intervals.   In  the  present  analysis,  I  work  with  the 
estimated  annual  incidence  rates  (and  their  sampling  errors)  derived 
from  the  three  studies. 

Let $q_{jt}$ denote the observed annual incidence of AIDS in cohort j
during year t, where j = 1, 2, 3 and t = 0, 1, 2, ..., 7. That is, $q_{jt}$ is the
estimated probability that an HIV-infected person in cohort j will
develop symptomatic AIDS in the half-open time interval [t, t+1), given
that AIDS was not diagnosed during [0, t).
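
The conversion from a cumulative distribution to conditional annual incidences can be sketched as follows. This is only an illustration under stated assumptions, not the procedure actually used for the three studies: the function name `annual_incidence` and the cumulative values in the example are hypothetical, and the cumulative distribution is assumed to have been read off at whole-year anniversaries.

```python
def annual_incidence(cum):
    """Convert cumulative proportions F(0), F(1), ..., F(T), taken at
    whole-year anniversaries, into conditional annual incidences for
    the intervals [t, t+1)."""
    rates = []
    for t in range(len(cum) - 1):
        at_risk = 1.0 - cum[t]  # proportion still AIDS-free at time t
        if at_risk <= 0.0:
            raise ValueError("no one left at risk at year %d" % t)
        rates.append((cum[t + 1] - cum[t]) / at_risk)
    return rates


# Hypothetical cumulative proportions with AIDS at anniversaries 0..3.
example = annual_incidence([0.0, 0.0, 0.05, 0.10])
```

In the example, the incidence in the last interval is 0.05 / 0.95 rather than 0.05, because only the 95% who are still AIDS-free at the second anniversary remain at risk.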

The data $q_{jt}$ are estimates of the actual incidences $\theta_{jt}$. I model
the sampling errors for each $\theta_{jt}$ as follows:

    $\log q_{jt} = \log \theta_{jt} + \epsilon_{jt}$    (1),

where the $\epsilon_{jt}$ are jointly normally distributed with zero means and known
variance-covariance matrix. This assumption was motivated by the fact
that the data in Figs. 1 through 3 are maximum likelihood estimates.
Since Eq. 1 is in logarithmic form, the standard deviations of the error
terms $\epsilon_{jt}$ can be interpreted as relative standard errors.
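
One practical consequence of the logarithmic form is that a relative standard error translates into a multiplicative interval around the estimate. The sketch below assumes that the log of the estimate is normally distributed and takes one standard error on each side; the helper name `interval_68` is hypothetical, and whether the intervals in the figures were computed exactly this way is not stated in the text.

```python
import math


def interval_68(estimate, rel_se):
    """Approximate 68% interval for a positive estimate whose logarithm
    is normal with standard deviation rel_se (one standard error each way)."""
    return estimate * math.exp(-rel_se), estimate * math.exp(rel_se)


# Year-5 value for the San Francisco Clinic Cohort in Table 1:
# incidence 0.106 with relative standard error 0.250.
low, high = interval_68(0.106, 0.250)
```

The resulting interval is asymmetric on the original scale, which is the expected behavior for an error model specified in logarithms.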

I further assume that the actual incidences $\theta_{jt}$ are interrelated
by the following linear model:

    $\log \theta_{jt} = \mu + \alpha_j + \gamma_t + \delta_{jt}$    (2)

where $\mu$, $\{\alpha_1, \alpha_2, \alpha_3\}$, and $\{\gamma_0, \ldots, \gamma_7\}$ are unknown parameters, and where
the error terms $\delta_{jt}$ are independently and identically normally
distributed with mean zero and variance $\sigma^2$. The parameters $\alpha_j$ are
cohort effects, while the $\gamma_t$ are time effects. By Eq. 2, the relative
incidence rates for any two cohorts are approximately independent of
duration t of HIV infection. That is, for any two cohorts j and k and
any year t, the quantity $\log \theta_{jt} - \log \theta_{kt}$ has time-independent mean
$\alpha_j - \alpha_k$ and variance $2\sigma^2$. In essence, the parameter $\sigma$ gauges the
overall accuracy of approximation to this "constant relative risk"
model. The critical statistical assumption here is that the
approximation errors $\delta_{jt}$ and $\delta_{kt}$ have independent, identical
distributions. This is what DuMouchel and Harris (5) called the
"exchangeability assumption."
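
The constant relative risk property follows from subtracting Eq. 2 for cohort k from Eq. 2 for cohort j. The short derivation below spells out that step; it uses only the symbols of Eq. 2 and the stated independence of the approximation errors.

```latex
\begin{aligned}
\log\theta_{jt} - \log\theta_{kt}
  &= (\mu + \alpha_j + \gamma_t + \delta_{jt}) - (\mu + \alpha_k + \gamma_t + \delta_{kt}) \\
  &= (\alpha_j - \alpha_k) + (\delta_{jt} - \delta_{kt}), \\
E\bigl[\log\theta_{jt} - \log\theta_{kt}\bigr] &= \alpha_j - \alpha_k, \\
\operatorname{Var}\bigl[\log\theta_{jt} - \log\theta_{kt}\bigr] &= \sigma^2 + \sigma^2 = 2\sigma^2 .
\end{aligned}
```

The time effect $\gamma_t$ cancels in the first line, which is why the mean of the log ratio does not depend on t.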

From Eqs. 1 and 2, the overall model is

    $\log q_{jt} = \mu + \alpha_j + \gamma_t + \epsilon_{jt} + \delta_{jt}$    (3).

If one has strong prior information, then all the parameters $\mu$, $\alpha$, $\gamma$ and
$\sigma$ can be estimated by Bayesian techniques (5). Here, I report only the
classical maximum likelihood estimates. Accordingly, only the
parameters $\mu$ and $\sigma$ and the contrasts $\alpha_j - \alpha_1$ (for j = 2 and 3) and
$\gamma_t - \gamma_0$ (for t = 1, ..., 7) are separately identified.
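
Why only contrasts are identified can be checked numerically: shifting every cohort effect by a constant c while subtracting c from the overall level leaves every fitted value of Eq. 2 unchanged. The sketch below uses made-up parameter values purely for illustration, and `fitted_log_incidence` is a hypothetical helper, not part of the original analysis.

```python
def fitted_log_incidence(mu, alpha, gamma):
    """Systematic part of Eq. 2, mu + alpha_j + gamma_t, for all (j, t)."""
    return [[mu + a + g for g in gamma] for a in alpha]


# Made-up values, for illustration only.
mu, alpha, gamma = -4.0, [0.0, 0.1, -0.3], [0.0, 1.5, 2.0]
c = 0.7

original = fitted_log_incidence(mu, alpha, gamma)
shifted = fitted_log_incidence(mu - c, [a + c for a in alpha], gamma)

# Largest difference between the two sets of fitted values.
max_gap = max(
    abs(x - y)
    for row1, row2 in zip(original, shifted)
    for x, y in zip(row1, row2)
)
```

Because the two parameter sets give the same fitted values up to rounding, the data cannot fix the level of the cohort effects without a normalization such as measuring them relative to cohort 1; the same argument applies to the time effects relative to year 0.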

The incidence data for homosexual men (j = 1) and hemophiliacs (j
= 2) are derived from cohorts of HIV-seropositive subjects. By
contrast, the estimates for transfusion recipients (j = 3) are derived
solely from those recipients who have so far developed AIDS. That is,
each $q_{3t}$ is an estimate of $\theta_{3t}/\Phi$, where $\Phi$ is the proportion of all
persons receiving HIV-infected blood who will develop AIDS by 8 years.
Accordingly, for the transfusion recipients (j = 3), Eq. 2 needs to be
replaced by

    $\log \theta_{3t} = \mu + \log \Phi + \alpha_3 + \gamma_t + \delta_{3t}$    (4).

In a Bayesian framework, prior information on $\Phi$ might permit us to
estimate this probability separately. In a classical framework,
however, only the contrast $\log \Phi + \alpha_3 - \alpha_1$ can be identified (7).

Results

The  incubation  periods  for  the  three  data  sources  showed  a  close 
fit to the model of Eq. 3. The likelihood function for the parameter $\sigma$
displayed an essentially flat top in the range $0 < \sigma < 0.05$. That is,
with  68%  probability,  the  assumption  of  constant  relative  incidence 
rates  is  accurate  to  within  5  percent. 

Moreover, I could not reject the hypothesis that the annual
incidence rates for HIV-infected homosexual men (cohort 1) equalled
those for HIV-infected hemophiliacs (cohort 2). Accordingly,
I shall present the estimated synthetic incidence rates under the
restriction $\alpha_1 = \alpha_2$.

Table 1 shows the estimated annual incidence rates of AIDS during
the years 0 through 7 after HIV seroconversion. The left-hand column
shows the original data $q_{1t}$ from the San Francisco Clinic Cohort. The
right-hand column shows the corresponding estimates of $\theta_{1t}$. The latter
show the incidence of AIDS rising from 0.4% in the year immediately
following HIV infection to 12.9% during year 7.

In computing the incidence rates $\theta_{1t}$ for homosexual men, the
statistical procedure essentially borrows information from the other two
cohorts. Hence, the relative standard deviations for the synthetic
estimates $\theta_{1t}$ in Table 1 are smaller than those of the original data
$q_{1t}$. Still, the synthetic estimates are not very precise. The estimated
incidence rate of .129 during year 7 has a 68 percent confidence
interval of .092 to .180. Accordingly, while we can be reasonably sure
that the incidence of AIDS rises during years 0 through 5 after HIV
infection, we cannot confidently conclude that it continues to rise
during years 6 and 7.
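
The gain in precision from pooling can be illustrated with a deliberately simplified calculation. Assuming that two cohorts share the same true incidence in a given year and that their log estimates have independent errors, an inverse-variance weighted average of the logs has a smaller standard error than either input. This is not the full maximum likelihood fit of Eq. 3: it ignores the cohort and time structure, the error covariances, and the parameter $\sigma$. The function name and the second cohort's numbers are hypothetical.

```python
import math


def pool_log_estimates(estimates, rel_ses):
    """Inverse-variance weighted mean of log estimates.
    Returns (pooled estimate, pooled relative standard error)."""
    weights = [1.0 / s ** 2 for s in rel_ses]
    total = sum(weights)
    pooled_log = sum(w * math.log(q) for w, q in zip(weights, estimates)) / total
    return math.exp(pooled_log), math.sqrt(1.0 / total)


# 0.106 (relative SE 0.250) is the year-5 San Francisco value in Table 1;
# 0.130 (relative SE 0.350) is a made-up value standing in for a second cohort.
pooled, pooled_rse = pool_log_estimates([0.106, 0.130], [0.250, 0.350])
```

The pooled relative standard error is below 0.250, the smaller of the two inputs, which is the sense in which one cohort's estimate borrows information from another.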

Having obtained the estimates $\theta_{1t}$, we can compute a synthetic
cumulative distribution function. This is done in Fig. 4, where the
filled circles show the estimated cumulative proportion with AIDS at the
start of each year. At the 8th anniversary of HIV infection, an
estimated 41.2% have AIDS, with 68 percent confidence interval of 36.2%
to 45.8%.
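
The step from annual incidences to a cumulative proportion is a running product of yearly AIDS-free probabilities. The sketch below applies it to the rounded synthetic estimates printed in Table 1; the function name is hypothetical. With these rounded inputs the 8th-anniversary value comes out at about 0.43, close to but not identical with the 41.2% reported in the text, which was presumably computed from unrounded estimates.

```python
def cumulative_from_incidence(rates):
    """Cumulative proportion with AIDS at the start of each year:
    F(0) = 0 and F(t+1) = 1 - prod_{s <= t} (1 - q_s)."""
    surviving = 1.0
    cumulative = [0.0]
    for q in rates:
        surviving *= 1.0 - q  # still AIDS-free after this year
        cumulative.append(1.0 - surviving)
    return cumulative


# Synthetic estimates for years 0 through 7, as printed in Table 1.
synthetic = [0.004, 0.024, 0.038, 0.059, 0.061, 0.119, 0.096, 0.129]
curve = cumulative_from_incidence(synthetic)  # curve[8] is the 8th-anniversary value
```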

In this study, we have no direct observations on the incidence of
AIDS past the 8th anniversary of HIV infection. Still, we can project
the incidence beyond that point if we are willing to make parametric
assumptions. As Fig. 4 shows, the synthetic estimates are consistent
with a two-parameter Weibull cumulative distribution function
$F(t) = 1 - e^{-(0.09t)^2}$, where t is now considered a continuous variable. This
two-parameter Weibull model predicts that 50% of HIV-infected subjects
would have AIDS by 9.25 years. By the 15th anniversary, 84% would have
AIDS. The data in Fig. 4 are insufficient to distinguish between this
two-parameter Weibull model, in which everyone who is infected would
eventually develop AIDS, and a three-parameter model in which some
unknown proportion are forever AIDS-free.
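
The two projections quoted above can be reproduced directly from the fitted curve. The sketch assumes the Weibull form F(t) = 1 - exp(-(0.09 t)^2), with t in years, which is the form consistent with the median and 15-year figures in the text; the function and constant names are illustrative.

```python
import math

RHO = 0.09   # scale parameter per year, taken from the fitted curve
SHAPE = 2.0  # Weibull shape: cumulative hazard proportional to t squared


def weibull_cdf(t):
    """F(t) = 1 - exp(-(RHO * t) ** SHAPE)."""
    return 1.0 - math.exp(-((RHO * t) ** SHAPE))


def weibull_quantile(p):
    """Time by which a proportion p has AIDS, from inverting F(t) = p."""
    return (-math.log(1.0 - p)) ** (1.0 / SHAPE) / RHO


median_years = weibull_quantile(0.5)  # about 9.25 years
at_15_years = weibull_cdf(15.0)       # about 0.84
```

Evaluating the same curve at 8 years gives roughly 0.40, in line with the synthetic estimate at the 8th anniversary.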

Discussion 

The  duration  of  the  incubation  period  for  HIV  may  be  determined  by 
many  factors,  including  the  initial  dose  of  the  virus;  the  initial  port 
of entry; the site of latent infection; the initial or subsequent state
of the immune response; the infected person's age; coinfection with
other  viruses;  and  reinfection  by  other  strains  of  HIV.   The  current 
study  has  detected  no  striking  differences  in  the  distribution  of  HIV 
incubation  among  homosexual  men  and  two  other  groups  who  presumably 
received  the  virus  through  the  blood-borne  route.   Some  men  in  the  San 
Francisco  Clinic  Cohort  may  have  been  infected  by  sharing  of  HIV- 
contaminated  needles.   Still,  the  evidence  so  far  does  not  reveal  a 
difference  in  the  duration  of  the  HIV  incubation  period  between  those 
infected  by  the  blood-borne  route  and  those  infected  by  sexual 
transmission.   If  there  are  as  yet  unidentified  "cofactors"  that 
influence  the  natural  history  of  HIV  infection,  the  present  results 
suggest  that  such  cofactors  are  common  to  diverse  risk  groups. 

This  study  focused  on  HIV-infected  adults.   Children  with 
hemophilia may have a different incubation curve (2), and have been
excluded  from  the  present  analysis.   In  future  research,  we  also  need  to 
analyze  separately  infants  who  have  received  infected  blood  transfusions 
(3)  and  newborns  who  have  acquired  HIV  by  vertical  transmission. 

The data in Fig. 4 fit quite closely to a parametric Weibull
distribution of the form $F(t) = 1 - e^{-(\rho t)^2}$. A reasonably good Weibull
fit has also been reported for the transfusion data alone, though some
authors have suggested the gamma, log-logistic, normal and log-normal
distributions (3, 4, 8). In the absence of direct observations on AIDS
incidence rates for persons infected for over 8 years, such parametric
models need to be viewed mainly as within-sample smoothing devices.
Their use in predicting the incidence of AIDS in the second decade after
HIV infection needs to be cautious and qualified.

Still, it is interesting that the cumulative hazard of AIDS
appears to rise with the square of time since initial HIV infection.
This mathematical form arises in reliability theory when a device
consists of a large number of paired elements. Any single pair "fails"
when both of its elements have failed, and the device fails as soon as
the first pair fails (9). If there are M pairs and if each element of
each pair has a constant failure rate $\omega$, then the cumulative
distribution of time to failure is $F(t) = 1 - e^{-(\rho t)^2}$, where $\rho = \omega\sqrt{M}$.
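
The squared-time form can be recovered by a short approximation argument. The sketch below assumes, for illustration, that all elements fail independently and that $\omega t$ is small over the period of interest; it is the standard argument written out, not a derivation taken from the cited reference.

```latex
\begin{aligned}
P(\text{one element has failed by } t) &= 1 - e^{-\omega t} \approx \omega t, \\
P(\text{one pair has failed by } t) &= \bigl(1 - e^{-\omega t}\bigr)^2 \approx (\omega t)^2, \\
1 - F(t) = P(\text{no pair has failed by } t)
  &= \bigl[1 - \bigl(1 - e^{-\omega t}\bigr)^2\bigr]^M \approx e^{-M(\omega t)^2}, \\
F(t) &\approx 1 - e^{-(\rho t)^2}, \qquad \rho = \omega\sqrt{M}.
\end{aligned}
```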

The biological analogy would be that HIV initially infects a large
number of host cells (e.g., lymphocytes, macrophages, Langerhans cells).
AIDS would become manifest after any single infected cell undergoes two
separate changes (e.g., coinfection with another virus, activation by
another foreign antigen, production of cytotoxic proteins). We have
estimated $\rho$ = 0.09, so if M = 100 cells were initially infected, then
each of the two subsequent steps would have a transition rate of $\omega$ =
0.009 per year.
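
The arithmetic of this illustration, and the quality of the Weibull approximation behind it, can be checked numerically. The sketch below compares the exact distribution for M independent pairs with the approximate form 1 - exp(-(rho t)^2); the function names are hypothetical, and independence of the elements is an assumption of the sketch.

```python
import math


def exact_paired_cdf(t, m_pairs, omega):
    """P(at least one of m_pairs pairs has both elements failed by time t),
    with independent elements each failing at constant rate omega."""
    pair_failed = (1.0 - math.exp(-omega * t)) ** 2
    return 1.0 - (1.0 - pair_failed) ** m_pairs


def weibull_approx_cdf(t, m_pairs, omega):
    """Small omega*t approximation: 1 - exp(-(rho*t)**2), rho = omega*sqrt(m_pairs)."""
    rho = omega * math.sqrt(m_pairs)
    return 1.0 - math.exp(-((rho * t) ** 2))


M, OMEGA = 100, 0.009        # the illustrative values used in the text
rho = OMEGA * math.sqrt(M)   # 0.09 per year
exact_8 = exact_paired_cdf(8.0, M, OMEGA)
approx_8 = weibull_approx_cdf(8.0, M, OMEGA)
```

At 8 years the exact and approximate values differ by roughly two percentage points, so the Weibull form is a reasonable but not exact summary of the paired-element model over the observed range.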

Table 1.  Estimated annual incidence of AIDS during years 0 through 7
after HIV seroconversion.*


Year    San Francisco      Synthetic
        Clinic Cohort      Estimate

 0      0.000              0.004
                           (0.364)

 1      0.000              0.024
                           (0.273)

 2      0.050              0.038
        (0.300)            (0.211)

 3      0.053              0.059
        (0.411)            (0.246)

 4      0.056              0.061
        (0.370)            (0.256)

 5      0.106              0.119
        (0.250)            (0.201)

 6      0.079              0.096
        (0.523)            (0.403)

 7      0.066              0.129
        (0.474)            (0.397)


*The numbers in parentheses are the relative standard errors. The San
Francisco Clinic Cohort data are from (1). The Synthetic Estimates are
the maximum likelihood estimates from three data sources (homosexual
men, hemophiliacs, transfusion recipients) based on the model of Eq. 3
with the restriction $\alpha_1 = \alpha_2$.

CAPTIONS TO FIGURES

Fig.  1.   Estimated  cumulative  proportion  of  homosexual  men  with  AIDS  in 
relation  to  the  time  since  HIV  seroconversion.   The  error  bars  show  the 
estimated 68 percent confidence intervals (±1 standard error). The data
are from (1).

Fig.  2.   Estimated  cumulative  proportion  of  adult  hemophiliacs  (aged  21 
and  over)  with  AIDS  in  relation  to  the  time  since  HIV  seroconversion. 
The error bars show the estimated 68 percent confidence intervals (±1
standard error). The data are from (2).

Fig. 3.  Estimated cumulative proportion of transfusion recipients with
AIDS in relation to the time since administration of HIV-infected blood.
The error bars show the estimated 68 percent confidence intervals (±1
standard error). A nonparametric maximum likelihood technique (6) was
applied to the data reported in (3).

Fig. 4.  Estimated cumulative proportion of HIV-infected persons with
AIDS in relation to the time since HIV seroconversion. The closed
circles (along with 68 percent confidence intervals) are the combined
estimates derived from Table 1. The continuous curve is the Weibull
distribution $F(t) = 1 - e^{-(0.09t)^2}$.